57 research outputs found
Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks
Uncertainty quantification (UQ) is important for assessing and enhancing the
reliability of machine learning models. In deep learning, uncertainties arise
not only from the data, but also from the training procedure, which often
injects substantial noise and bias. These hinder the attainment of statistical
guarantees and, moreover, impose computational challenges on UQ due to the need
for repeated network retraining. Building upon recent neural tangent kernel
theory, we create statistically guaranteed schemes to \emph{quantify}, and
\emph{remove}, the procedural uncertainty of over-parameterized neural networks
in a principled way and with very low computational effort. In
particular, our approach, based on what we call a procedural-noise-correcting
(PNC) predictor, removes the procedural uncertainty by using only \emph{one}
auxiliary network that is trained on a suitably labeled data set, instead of
the many retrained networks employed in deep ensembles. Moreover, by combining our
PNC predictor with suitable light-computation resampling methods, we build
several approaches to construct asymptotically exact-coverage confidence
intervals using as few as four trained networks, without additional overhead.
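The procedural uncertainty this abstract targets is, in the deep-ensemble baseline it improves upon, estimated by retraining the same model under different random seeds and measuring the spread of the resulting predictions. A minimal toy sketch of that baseline (a linear model trained by SGD standing in for a network; all names and numbers are illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy illustration (not the paper's PNC method): procedural uncertainty is the
# spread of predictions across retrains that differ only in random seed.
rng_data = np.random.default_rng(0)
X = rng_data.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng_data.normal(size=64)

def train(seed, steps=200, lr=0.1):
    rng = np.random.default_rng(seed)
    w = rng.normal(size=3)             # random init: one source of procedural noise
    for _ in range(steps):
        idx = rng.integers(0, 64, 16)  # random mini-batches: another source
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / 16
        w -= lr * grad
    return w

x_test = np.ones(3)
preds = np.array([train(s) @ x_test for s in range(20)])
procedural_std = preds.std(ddof=1)     # ensemble spread ~ procedural uncertainty
```

The paper's contribution is precisely to avoid the 20 retrains in this sketch, replacing them with a single auxiliary network.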
Quantifying Epistemic Uncertainty in Deep Learning
Uncertainty quantification is at the core of the reliability and robustness
of machine learning. In this paper, we provide a theoretical framework to
dissect the uncertainty, especially the epistemic component, in deep learning
into procedural variability (from the training procedure) and data variability
(from the training data), which, to the best of our knowledge, is the first
such attempt in the literature. We then propose two approaches to estimate these
uncertainties, one based on the influence function and one on batching. We
demonstrate how our approaches overcome the computational difficulties in
applying classical statistical methods. Experimental evaluations on multiple
problem settings corroborate our theory and illustrate how our framework and
estimation can provide direct guidance on modeling and data collection effort
to improve deep learning performance.
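Of the two estimators mentioned, batching is the simpler to sketch: split the training data into disjoint batches, retrain on each, and form a t-interval from the batch-wise predictions. A toy sketch with a closed-form least-squares learner standing in for a network (all sizes and values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(size=200)

k = 5                                   # number of disjoint data batches
x_test = np.array([1.0, 1.0])
preds = []
for b in range(k):
    Xb, yb = X[b::k], y[b::k]           # each batch sees different training data
    w = np.linalg.lstsq(Xb, yb, rcond=None)[0]   # "retrain" on each batch
    preds.append(w @ x_test)
preds = np.array(preds)

# Batch-means t-interval with k-1 degrees of freedom (t_{0.975,4} ~ 2.776).
mean, se = preds.mean(), preds.std(ddof=1) / np.sqrt(k)
ci = (mean - 2.776 * se, mean + 2.776 * se)
```

The batch-to-batch spread captures data variability; the paper's framework separates this from the procedural component.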
Self-Aligned Concave Curve: Illumination Enhancement for Unsupervised Adaptation
Low light conditions not only degrade human visual experience, but also
reduce the performance of downstream machine analytics. Although many works
have been designed for low-light enhancement or domain-adaptive machine
analytics, the former gives little consideration to high-level vision, while
the latter neglects the potential of image-level signal adjustment. How to restore
underexposed images/videos from the perspective of machine vision has long been
overlooked. In this paper, we are the first to propose a learnable illumination
enhancement model for high-level vision. Inspired by real camera response
functions, we assume that the illumination enhancement function should be a
concave curve, and propose to satisfy this concavity through a discrete integral.
With the intention of adapting illumination from the perspective of machine
vision without task-specific annotated data, we design an asymmetric
cross-domain self-supervised training strategy. Our model architecture and
training designs mutually benefit each other, forming a powerful unsupervised
normal-to-low light adaptation framework. Comprehensive experiments demonstrate
that our method surpasses existing low-light enhancement and adaptation methods
and shows superior generalization on various low-light vision tasks, including
classification, detection, action recognition, and optical flow estimation.
Project website: https://daooshee.github.io/SACC-Website/
Comment: This paper has been accepted by ACM Multimedia 202
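The concavity-via-discrete-integral idea can be sketched as follows: integrate strictly positive slope increments to obtain a monotone increasing curve, and make the increments nonincreasing so the integrated curve is concave. This is a hedged reconstruction of the construction, not the paper's implementation:

```python
import numpy as np

def concave_curve(raw_slopes):
    """Build a monotone concave tone curve by discrete integration.

    Positive slopes make the curve increasing; sorting them in descending
    order makes the slope nonincreasing, i.e. the curve concave.
    """
    slopes = np.exp(raw_slopes)          # > 0  -> monotone increasing
    slopes = np.sort(slopes)[::-1]       # nonincreasing -> concave
    curve = np.concatenate([[0.0], np.cumsum(slopes)])
    return curve / curve[-1]             # normalize output range to [0, 1]

# 255 unconstrained parameters (e.g. network outputs) -> a 256-point curve.
raw = np.random.default_rng(0).normal(size=255)
curve = concave_curve(raw)
second_diff = np.diff(curve, n=2)        # concavity check: should be <= 0
```

In the paper the increments would be predicted by the enhancement network; here they are random stand-ins.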
Attentive Symmetric Autoencoder for Brain MRI Segmentation
Self-supervised learning methods based on image patch reconstruction have
witnessed great success in training auto-encoders, whose pre-trained weights
can be transferred to fine-tune other downstream tasks of image understanding.
However, existing methods seldom study the varying importance of reconstructed
patches or the symmetry of anatomical structures when applied to 3D
medical images. In this paper, we propose a novel Attentive Symmetric
Auto-encoder (ASA) based on Vision Transformer (ViT) for 3D brain MRI
segmentation tasks. We conjecture that forcing the auto-encoder to recover
informative image regions yields more discriminative representations than
recovering smooth image patches, and we adopt a gradient-based metric to
estimate the importance of each image patch. In the pre-training stage, the
proposed auto-encoder pays more attention to reconstructing the informative
patches according to this gradient metric. Moreover, we resort to the prior of
brain structures and develop a Symmetric Position Encoding (SPE) method to
better exploit the correlations between long-range but spatially symmetric
regions to obtain effective features. Experimental results show that our
proposed attentive symmetric auto-encoder outperforms the state-of-the-art
self-supervised learning methods and medical image segmentation models on three
brain MRI segmentation benchmarks.
Comment: MICCAI 2022, code: https://github.com/lhaof/AS
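The gradient-based patch-importance idea can be sketched as scoring each patch by its mean image-gradient magnitude and using the normalized scores to weight the reconstruction loss; the function below is a hypothetical stand-in in 2D, not the paper's 3D code:

```python
import numpy as np

def patch_importance(image, patch=8):
    """Score patches by mean image-gradient magnitude (a proxy for how
    informative a patch is; smooth patches score low)."""
    gy, gx = np.gradient(image.astype(float))   # gradients along rows, cols
    mag = np.hypot(gx, gy)
    h, w = image.shape
    mag = mag[:h - h % patch, :w - w % patch]   # crop to a whole number of patches
    scores = mag.reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    return scores / scores.sum()                # normalized attention weights

# A flat image with one textured patch: that patch should dominate the weights,
# so its reconstruction error would be weighted most heavily in pre-training.
img = np.zeros((32, 32))
img[8:16, 8:16] = np.random.default_rng(0).normal(size=(8, 8))
weights = patch_importance(img, patch=8)
```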
Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection
Multi-class cell nuclei detection is a fundamental prerequisite in
histopathology diagnosis. It is critical to efficiently locate and identify
cells with diverse morphologies and distributions in digital pathological images.
Most existing methods take complex intermediate representations as learning
targets and rely on inflexible post-refinements, while paying less attention to
varying cell densities and fields of view. In this paper, we propose a novel
Affine-Consistent Transformer (AC-Former), which directly yields a sequence of
nucleus positions and is trained collaboratively through two sub-networks, a
global and a local network. The local branch learns to infer nucleus positions
from distorted, smaller-scale input images, while the global network outputs the large-scale
predictions as extra supervision signals. We further introduce an Adaptive
Affine Transformer (AAT) module, which can automatically learn the key spatial
transformations to warp original images for local network training. The AAT
module works by learning to capture the transformed image regions that are more
valuable for training the model. Experimental results demonstrate that the
proposed method significantly outperforms existing state-of-the-art algorithms
on various benchmarks.
Comment: ICCV 2023, released code: https://github.com/lhaof/ACForme
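The global-to-local supervision can be sketched as mapping the global network's predicted nucleus positions through the same affine transform used to warp the local branch's input, then penalizing the local predictions' deviation from these warped targets. All matrices and coordinates below are illustrative assumptions, not learned values:

```python
import numpy as np

def warp_points(points, A, t):
    """Map (x, y) nucleus positions through an affine transform x -> A x + t."""
    return points @ A.T + t

# Hypothetical predictions: the global network on the original image, and the
# local network on the affinely warped image.
global_preds = np.array([[10.0, 20.0], [30.0, 5.0]])
A = np.array([[1.2, 0.1], [-0.1, 0.9]])      # affine (AAT would learn this)
t = np.array([2.0, -3.0])

targets = warp_points(global_preds, A, t)    # global outputs as extra supervision
local_preds = targets + 0.5                  # pretend local output, slightly off
consistency_loss = np.mean(np.sum((local_preds - targets) ** 2, axis=1))
```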
Prompt-based Grouping Transformer for Nucleus Detection and Classification
Automatic nuclei detection and classification can produce effective
information for disease diagnosis. Most existing methods either classify nuclei
independently or do not make full use of the semantic similarity between nuclei
and their grouping features. In this paper, we propose a novel end-to-end
nuclei detection and classification framework based on a grouping
transformer-based classifier. The nuclei classifier learns and updates the
representations of nuclei groups and categories via hierarchically grouping the
nucleus embeddings. Then the cell types are predicted with the pairwise
correlations between categorical embeddings and nucleus features. For the
efficiency of the fully transformer-based framework, we take the nucleus group
embeddings as the input prompts of the backbone, which helps harvest
grouping-guided features by tuning only the prompts instead of the whole backbone.
Experimental results show that the proposed method significantly outperforms
the existing models on three datasets.
Comment: MICCAI 2023, released code: https://github.com/lhaof/PG
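One step of the hierarchical grouping can be sketched as a soft assignment of nucleus embeddings to learnable group embeddings followed by an assignment-weighted update, with cell types read off from pairwise similarity to category embeddings. A toy numpy sketch under these assumptions (not the paper's transformer implementation; all sizes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
nuclei = rng.normal(size=(6, 4))         # 6 nucleus embeddings, dim 4
groups = rng.normal(size=(3, 4))         # 3 learnable group embeddings ("prompts")

# One grouping step: soft-assign nuclei to groups by similarity, then update
# each group embedding as the assignment-weighted mean of its nuclei.
assign = softmax(nuclei @ groups.T, axis=1)              # (6, 3)
groups_new = (assign.T @ nuclei) / assign.T.sum(axis=1, keepdims=True)

# Cell-type prediction from pairwise correlation with category embeddings.
categories = rng.normal(size=(2, 4))
logits = nuclei @ categories.T
pred = logits.argmax(axis=1)
```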
QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking
Similarity learning has been recognized as a crucial step for object
tracking. However, existing multiple object tracking methods only use sparse
ground truth matching as the training objective, while ignoring the majority of
the informative regions in images. In this paper, we present Quasi-Dense
Similarity Learning, which densely samples hundreds of object regions on a pair
of images for contrastive learning. We combine this similarity learning with
multiple existing object detectors to build Quasi-Dense Tracking (QDTrack),
which does not require displacement regression or motion priors. We find that
the resulting distinctive feature space admits a simple nearest neighbor search
at inference time for object association. In addition, we show that our
similarity learning scheme is not limited to video data, but can learn
effective instance similarity even from static input, enabling competitive
tracking performance without training on videos or using tracking supervision.
We conduct extensive experiments on a wide variety of popular MOT benchmarks.
We find that, despite its simplicity, QDTrack rivals the performance of
state-of-the-art tracking methods on all benchmarks and sets a new
state-of-the-art on the large-scale BDD100K MOT benchmark, while introducing
negligible computational overhead to the detector.
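The quasi-dense scheme can be sketched as a contrastive (InfoNCE-style) loss over many region embeddings sampled from an image pair, with nearest-neighbor search in the learned embedding space for association at inference. The sizes and noise level below are illustrative assumptions:

```python
import numpy as np

def info_nce(query, keys, pos_idx, tau=0.1):
    """Contrastive loss: each query should match its positive key among all keys."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(query)), pos_idx].mean()

rng = np.random.default_rng(0)
# Densely sampled region embeddings: region i in frame 1 matches region i in
# frame 2 (same object, slightly changed appearance).
keys = rng.normal(size=(100, 64))
queries = keys + 0.05 * rng.normal(size=(100, 64))
loss = info_nce(queries, keys, pos_idx=np.arange(100))

# At inference, association is a simple nearest-neighbor search.
q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
match = (q @ k.T).argmax(axis=1)
```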
Push the Boundary of SAM: A Pseudo-label Correction Framework for Medical Segmentation
Segment anything model (SAM) has emerged as the leading approach for
zero-shot learning in segmentation, offering the advantage of avoiding
pixel-wise annotation. It is particularly appealing in medical image
segmentation where annotation is laborious and expertise-demanding. However,
the direct application of SAM often yields inferior results compared to
conventional fully supervised segmentation networks. While SAM-generated
pseudo labels could also benefit the training of fully supervised segmentation,
the performance is limited by the quality of the pseudo labels. In this paper,
we propose a novel label correction framework to push the boundary of SAM-based
segmentation. Our model utilizes a novel noise detection module to distinguish
noisy labels from clean labels. This enables us to correct the noisy
labels using an uncertainty-based self-correction module, thereby enriching the
clean training set. Finally, we retrain the network with updated labels to
optimize its weights for future predictions. One key advantage of our model is
its ability to train deep networks using SAM-generated pseudo labels without
relying on a subset of expert-level annotations. We demonstrate the
effectiveness of our proposed model on both X-ray and lung CT datasets,
indicating its ability to improve segmentation accuracy and outperform baseline
methods in label correction.
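The noise-detection plus self-correction loop can be sketched as: flag pixels where the current model confidently disagrees with the SAM pseudo label, flip those labels, and keep the rest before retraining. A hypothetical binary-segmentation sketch (the threshold and arrays are illustrative, not the paper's modules):

```python
import numpy as np

def correct_pseudo_labels(probs, pseudo, conf_thresh=0.9):
    """Uncertainty-based self-correction sketch.

    probs  : (N,) model foreground probabilities per pixel
    pseudo : (N,) binary SAM-style pseudo labels
    Flips a pseudo label only where the model is confident (low uncertainty)
    AND disagrees with it; uncertain pixels keep their pseudo label.
    """
    model_pred = (probs > 0.5).astype(int)
    confident = np.maximum(probs, 1 - probs) > conf_thresh   # low uncertainty
    noisy = confident & (model_pred != pseudo)               # detected noise
    corrected = np.where(noisy, model_pred, pseudo)
    return corrected, noisy

probs = np.array([0.98, 0.95, 0.40, 0.02, 0.60])
pseudo = np.array([0, 1, 1, 1, 0])
corrected, noisy = correct_pseudo_labels(probs, pseudo)
```

Retraining on `corrected` rather than `pseudo` is what enriches the clean training set in each round.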